Total Reward Stochastic Games and Sensitive Average Reward Strategies
Related resources
Average Reward Timed Games
We consider real-time games where the goal consists, for each player, in maximizing the average reward he or she receives per time unit. We consider zero-sum rewards, so that a reward of +r to one player corresponds to a reward of −r to the other player. The games are played on discrete-time game structures which can be specified using a two-player version of timed automata whose locations are ...
Learning in Average Reward Stochastic Games: A Reinforcement Learning (Nash-R) Algorithm for Average Reward Irreducible Stochastic Games
A large class of sequential decision-making problems under uncertainty with multiple competing decision makers can be modeled as stochastic games. Stochastic games can be viewed as multiplayer extensions of Markov decision processes (MDPs). In this paper, we develop a reinforcement learning algorithm to obtain an average reward equilibrium for irreducible stochastic games. In our ...
Sensitive Discount Optimality: Unifying Discounted and Average Reward Reinforcement Learning
Research in reinforcement learning (RL) has thus far concentrated on two optimality criteria: the discounted framework, which has been very well-studied, and the average-reward framework, in which interest is rapidly increasing. In this paper, we present a framework called sensitive discount optimality which offers an elegant way of linking these two paradigms. Although sensitive discount optima...
Reinforcement Learning for Average Reward Zero-Sum Games
We consider reinforcement learning for average reward zero-sum stochastic games. We present and analyze two algorithms. The first is based on relative Q-learning and the second on Q-learning for stochastic shortest path games. Convergence is proved using the ODE (Ordinary Differential Equation) method. We further discuss the case where not all the actions are played by the opponent with comparab...
Reward Ideas and Strategies
The reward system can be considered the center of our body-mind equilibrium: we have medial reward, leading life, reproduction, needs, and desires; if this is not adequately satisfied, dopamine is converted by dopamine beta-hydroxylase into catecholamines, and the habenula regulates the balance so that lateral reward risks becoming prevalent, with activation of the HPA axis (hypothalamus-pituitary gland-adrenal...
Journal
Journal title: Journal of Optimization Theory and Applications
Year: 1998
ISSN: 0022-3239,1573-2878
DOI: 10.1023/a:1022697100194